Background: The No Child Left Behind (NCLB) Act of 2001 leaves many of the details of 
establishing and implementing an accountability system up to the individual states, with the 
requirement for the system to be valid and reliable (NCLB, 2002). Determination of how well 
schools and districts are doing is based largely on the classification of student performance on 
standardized assessments. However, making classification decisions about students has been 
part and parcel of research on testing long before NCLB was implemented. Some early work on 
error in such classification systems considered the problem with a loss function framework 
(Traub & Rowley, 1980). Other work included focused studies of misclassification using 
alternate test forms (Livingston & Lewis, 1995) or the application of generalizability theory 
(Brennan & Kane, 1977). Large sample sizes were demonstrated to produce more accurate and 
precise classification (Yen, 1997). 

The reliability of a classification of student performance necessarily impacts AYP 
determinations for groups, be it subgroups within schools, schools themselves, or districts. In 
implementing NCLB, individual states considered the error associated with an aggregated 
decision and the importance of group size. In general, state-specific treatment of classification 
error proceeded along one of two routes. About 80% of states use conventional confidence 
intervals (CI) to express uncertainty, while some states use the standard error of measurement 
(SEM; U.S. Department of Education, 2010). The estimate of AYP and these approaches to treating 
misclassification are compared to an alternative estimator that is described below. 
Purpose: The purpose of this study is to develop an estimate of Adequate Yearly Progress 
(AYP) that will allow for reliable and valid comparisons among student subgroups, schools, and 
districts. A shrinkage-type estimator of AYP using the Bayesian framework is described. Using 
simulated data, the performance of the Bayes estimator is compared to that of a currently used 
non-Bayes estimator. While it is likely that NCLB will either experience a major overhaul or go away 
altogether, decisions will still have to be made about children’s academic performance, and these 
systems will likely continue to use assessment performance as an indicator of academic 
performance. The estimator developed here has application to such accountability measures. 
Significance: Differences in group sample sizes pose a difficult challenge to reliable estimates 
of AYP. For example, Table 1 shows the distribution of school district sizes in the state of 
Michigan. Figure 1 shows an example of how variable public school students’ scores are in 
districts across the state. Highly variable district sizes make the direct estimators heterogeneous, 
which in turn compromises their value for drawing conclusions and making inferences. 
Variability such as that shown here can be properly reflected when a so-called “shrinkage” 
estimator is used. This estimator is an alternative to the commonly used “direct” estimator (an 
estimate based only on information specific to the district or subgroup considered). A 
shrinkage-type approach has been used in hierarchical models (e.g., Raudenbush & Bryk, 2001) 
to estimate the relationship between variables while taking account of the different levels of 
observed data (e.g., students, classrooms, schools). A shrinkage-based technique for AYP 
determination would allow for comparisons of AYP across schools and districts. 

(Please insert Table 1 here) 

(Please insert Figure 1 here) 

Statistical Model: Below, a Bayes estimator of AYP conformance is described and compared to 
a currently used estimator. Prior to the development of these estimators, the general procedure 
of AYP determination is described. Given the latitude NCLB gave to states about the procedure 
for determining AYP, it’s important to note that the procedure described below is not that used 
by all states, but is merely meant to serve as a representative example. The procedures 
implemented by the state of Michigan are occasionally used to help illustrate a point. Regardless 
of the specific procedure used by a state, the estimators described below are applicable. 

For each $i = 1, 2, \ldots, n$, let school $i$ comprise $m_i$ students and a proportion $p_i$ of students 
who are proficient; a student is declared proficient if his/her score on a particular test is equal to 
or above the State specific target. Thus, $100p_i$ is the percent proficient (see Michigan Department 
of Education, 2011) for the $i$-th school. This proportion $p_i$ is unknown and is estimated for school 
$i$ using $\hat{p}_i$, the observed proportion of students meeting the AYP criteria. Since a student’s score 
on a test is typically a random variable, it follows that $\hat{p}_i$ has measurement error built in 
depending on all such students’ scores, and thus, is also a random quantity. AYP classification is 
a high-stakes decision and it is important to estimate each $p_i$ as reliably as possible by taking into 
account all possible sources of uncertainty. 

More specifically, let $\hat{p}_{ik}$ be the observed proportion of proficient students in the $k$-th 
subgroup in school $i$, for $k = 1, 2, \ldots, K$ corresponding to the grade levels in school $i$. Assume that 
the number tested is $m_{ik}$, the true proportion of proficient students is $p_{ik}$, and the target is $p^0_k$ for 
the $k$-th subgroup. The AYP score for school $i$ is determined as: 

$$\text{AYP score} = 100 \sum_{k=1}^{K} \frac{m_{ik}}{m_i} \left(\hat{p}_{ik} - p^0_k\right), \qquad (1)$$

which is then compared to the threshold value of 0. School $i$ is declared to have met the State 
objective if the weighted score is equal to or above 0. More generally, the above procedure is 
applied separately to the different subjects (i.e., reading, mathematics) and the subject-specific 
AYP score is compared to 0. Most states use a combination rule to determine whether the school 
meets AYP (for an example of such a rule for Michigan, see Michigan Department of Education, 
2011). 
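
As an illustration of the weighted score in (1), the following minimal Python sketch computes the AYP score for one school. It assumes that each subgroup is weighted by its share of tested students (subgroup count divided by the school total) and that the school total is the sum of the subgroup counts; the function name `ayp_score` and the example numbers are hypothetical and are not taken from any state's procedure.

```python
def ayp_score(observed, targets, n_tested):
    """Weighted AYP score for one school, following the form of equation (1).

    observed  -- observed proportions proficient, one per subgroup k
    targets   -- state targets, one per subgroup k
    n_tested  -- numbers of students tested, one per subgroup k
    """
    # Assumption: the school total is the sum of the subgroup counts.
    m_i = sum(n_tested)
    return 100.0 * sum(
        (m / m_i) * (p_hat - p0)
        for p_hat, p0, m in zip(observed, targets, n_tested)
    )


# Hypothetical school with three grade-level subgroups.
score = ayp_score(
    observed=[0.72, 0.65, 0.80],
    targets=[0.70, 0.70, 0.70],
    n_tested=[40, 60, 100],
)
# The school meets the objective when the weighted score is at or above 0.
meets_objective = score >= 0
```

In this hypothetical case the second subgroup falls short of its target, but the larger third subgroup outweighs it, so the weighted score is positive.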

Referencing (1), the key statistic used to determine the final AYP score is the proportion 
of proficient students $\hat{p}_{ik}$ in each subgroup. Also, the mean AYP score is obtained by replacing 
the statistic $\hat{p}_{ik}$ by its unknown true value $p_{ik}$. The school is truly proficient if the unknown 
mean AYP score is greater than or equal to 0. Due to uncertainty in the $\hat{p}_{ik}$'s, there can be two 
types of errors associated with the decision made: A school can be declared to have met AYP 
when the true mean score is below 0 or the school can be declared not to have met AYP when 
the true mean score is above 0. These two errors, called false-positive and false-negative, can be 
minimized if the true but unknown $p_{ik}$ values are estimated accurately. In the subsequent 
discussion, we focus on one $p_{ik} = p_i$, noting that the models elicited for one such $p_i$ can be easily 
generalized to include multiple $p_{ik}$'s. Furthermore, the procedures currently practiced by states can 
be suitably adapted on a case by case basis under the proposed general framework. 

To determine whether a school meets AYP or not, many states have adopted the confidence 
interval approach (Forte-Fast & Erpenbach, 2004; Marion et al., 2002; U.S. Department 
of Education, 2010). The confidence interval procedure is as follows: Based on the observed 
proportion $\hat{p}_i$ for school $i$, construct the standard error of the proportion, $SE(\hat{p}_i)$, and, say, the 
95% upper confidence interval for $p_i$ using the normal approximation. This upper one-sided 
confidence interval is given by $[0, \hat{p}_i + z_{0.95} SE(\hat{p}_i)]$, where $z_{0.95}$ is a z-value such that 
$\Phi(z_{0.95}) = 0.95$ for the cumulative distribution function of the standard normal distribution, $\Phi$. This 
approximation is valid only when the student size $m_i$ is large; thus, when $m_i$ is small, either the 
estimates are not reported or, if reported, the estimates are known to be unreliable for determining 
AYP conformance. Nevertheless, the above confidence interval is compared to the target level, $c$, 
which is determined by the state. School $i$ meets AYP if its upper limit is above the target level, 
that is, if $\hat{p}_i + z_{0.95} SE(\hat{p}_i) > c$. 
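
The decision rule above can be sketched in a few lines of Python. This is only an illustration: it assumes the standard error is the usual plug-in binomial form, the square root of the observed proportion times its complement divided by the number of students, which the text does not spell out at this point. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_confidence_limit(p_hat, m, level=0.95):
    """One-sided upper limit p_hat + z * SE(p_hat) under the normal approximation."""
    z = NormalDist().inv_cdf(level)        # z-value with Phi(z) = level
    se = sqrt(p_hat * (1.0 - p_hat) / m)   # assumed plug-in binomial standard error
    return p_hat + z * se


def meets_ayp(p_hat, m, c, level=0.95):
    """A school meets AYP if its upper confidence limit exceeds the target c."""
    return upper_confidence_limit(p_hat, m, level) > c


# Same observed proportion, two hypothetical school sizes, target c = 0.70.
small_school = meets_ayp(0.60, 30, 0.70)
large_school = meets_ayp(0.60, 200, 0.70)
```

With these hypothetical inputs the small school passes and the large school does not, even though both have the same observed proportion below the target, because the interval is wider when fewer students are tested.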

The use of the upper confidence limit, although easy to interpret and justify, presents 
some serious difficulties. When the sample size $m_i$ is small, the standard error expression 
$SE(\hat{p}_i)$ is only a crude approximation to the true standard error. In fact, it is much larger than 
the true value, resulting in a higher upper confidence limit. The normal approximation is also 
quite unreliable in this case, and thus, the use of $z_{0.95}$ as a critical value may be suspect. More 
subtly, if this upper confidence criterion is used to determine AYP decisions for $n > 1$ schools 
simultaneously, it will produce a significant upward bias. This will be true even for large $m_i$ 
values. To demonstrate this last fact, we consider the scenario of estimating the proportion of 
schools in a district that meet AYP, 

$$\theta = \frac{1}{n} \sum_{i=1}^{n} I(p_i > c),$$

where $I$ is the indicator function which takes the value 1 if $p_i > c$, and 0, otherwise. Note that 
$0 \le \theta \le 1$. 

The upper confidence limit approach gives rise to an estimate of $\theta$, $\hat{\theta}_U$, given by 

$$\hat{\theta}_U = \frac{1}{n} \sum_{i=1}^{n} I\left(\hat{p}_i + z_{0.95} SE(\hat{p}_i) > c\right).$$

The estimate $\hat{\theta}_U$ has an upward bias. This results from the introduction of the term 
$z_{0.95} SE(\hat{p}_i)$, which includes some schools whose actual proportion $p_i$ is less than $c$. In other 
words, $\hat{\theta}_U$ will overestimate the true proportion $\theta$, and will favor certain 
schools even when they have not performed at the desired level. Moreover, if $m_i$ is small, the 
confidence intervals could be wider due to large variance for small subgroups, thus accentuating 
the upward bias. In addition, $\hat{\theta}_U$ is subject to higher variability since $SE(\hat{p}_i)$ is again estimated 
(the true value is $SE(p_i)$). Further, $SE(\hat{p}_i)$ itself could be large, particularly for small $m_i$. 
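
To make the source of the upward bias concrete, the sketch below counts the share of schools that pass the upper-confidence-limit rule and compares it with the share whose observed proportion alone exceeds the target. The plug-in standard error, the helper names, and the four observed proportions are assumptions made for illustration; the second function is only a comparison device and is not an estimator proposed in the text.

```python
from math import sqrt
from statistics import NormalDist


def theta_upper(p_hats, ms, c, level=0.95):
    """Share of schools whose upper confidence limit exceeds the target c."""
    z = NormalDist().inv_cdf(level)
    passed = 0
    for p_hat, m in zip(p_hats, ms):
        se = sqrt(p_hat * (1.0 - p_hat) / m)  # assumed plug-in standard error
        if p_hat + z * se > c:
            passed += 1
    return passed / len(p_hats)


def theta_no_margin(p_hats, c):
    """Share of schools whose observed proportion alone exceeds c (comparison only)."""
    return sum(p_hat > c for p_hat in p_hats) / len(p_hats)


# Hypothetical district: four schools of 30 students each, target c = 0.70.
p_hats = [0.50, 0.60, 0.66, 0.72]
ms = [30, 30, 30, 30]
theta_u = theta_upper(p_hats, ms, 0.70)
theta_plain = theta_no_margin(p_hats, 0.70)
```

Because the margin term is never negative, the first share can never fall below the second; in this hypothetical district two schools with observed proportions below the target are counted as meeting AYP.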

Alternative estimators of $p_i$ and $\theta$ are motivated from the Bayesian perspective. Assume 
for the moment that $m_i$ is large so that $\hat{p}_i$ is approximately normally distributed with mean $p_i$ and 
variance $\sigma_i^2 = p_i(1 - p_i)/m_i$ (note that the model below can be extended to encompass the case 
of small $m_i$ and other flexible distributions for $p_i$, but these extensions are outside of the scope of 
the present study). We consider the hierarchical (multilevel) model given by 

$$\hat{p}_i \overset{\text{ind}}{\sim} N(p_i, \sigma_i^2), \quad \text{and} \qquad (2)$$

$$\text{logit}(p_i) \overset{\text{iid}}{\sim} N(\mu, \tau^2), \qquad (3)$$

for $i = 1, 2, \ldots, n$; in (2), ‘ind’ refers to independent and in (3), ‘iid’ refers to independent and 
identically distributed. The logit transformation of $p_i$ is $\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right)$. 
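
A minimal sketch of drawing data from the two-stage model in (2) and (3) is given below. It assumes equal school sizes within a district and clips the normally distributed observed proportion to the unit interval, a convenience that the model itself does not specify. The function names, the seed, and the parameter values in the usage lines are hypothetical.

```python
import math
import random


def inv_logit(x):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_district(n, m, mu, tau, rng):
    """Draw true and observed proportions for n schools of m students each."""
    true_p, observed_p = [], []
    for _ in range(n):
        p = inv_logit(rng.gauss(mu, tau))      # stage (3): logit of true proportion is normal
        sigma = math.sqrt(p * (1.0 - p) / m)   # large-m standard deviation of the observed proportion
        p_hat = rng.gauss(p, sigma)            # stage (2): observed proportion given the true one
        p_hat = min(max(p_hat, 0.0), 1.0)      # assumption: clip to [0, 1]
        true_p.append(p)
        observed_p.append(p_hat)
    return true_p, observed_p


rng = random.Random(2011)  # arbitrary seed, for reproducibility only
true_p, observed_p = simulate_district(n=10, m=30, mu=1.0, tau=1.0, rng=rng)
```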

Using Bayes theorem, the posterior of each $p_i$ is determined by the above hierarchical 
model specification and is given, up to a proportionality constant, by 

$$\pi(p_i \mid \hat{p}_i, \mu, \tau) \propto \pi(\hat{p}_i \mid p_i)\, \pi(p_i \mid \mu, \tau), \qquad (4)$$

where $\pi(\hat{p}_i \mid p_i) = (2\pi\sigma_i^2)^{-1/2} \exp\{-(\hat{p}_i - p_i)^2 / (2\sigma_i^2)\}$ is the first-stage density in (4) and 
$\pi(p_i \mid \mu, \tau) = (d\,\text{logit}(p_i)/dp_i)\,(2\pi\tau^2)^{-1/2} \exp\{-(\text{logit}(p_i) - \mu)^2 / (2\tau^2)\}$ is the second-stage 
density in (4). The alternative Bayes estimator of $\theta$ is 

$$\hat{\theta}_B = \frac{1}{n} \sum_{i=1}^{n} P^*(p_i > c \mid \hat{p}_i), \qquad (5)$$

where the probability $P^*$ is computed with respect to the posterior distribution of $p_i$ given $\hat{p}_i$ in 
(4). The utility of this estimator in comparison to the confidence interval estimator is 
demonstrated in the next section. 
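
One way to evaluate the posterior probability in (5) is by numerical integration of the unnormalized posterior in (4) over a grid on the unit interval. The sketch below does this with a midpoint rule. It treats the hyperparameters as known and averages the per-school posterior probabilities that the true proportion exceeds the target; these are assumptions of the sketch, as are the function names, the grid size, and the example inputs. The text instead describes computing the probability by Monte Carlo.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def posterior_prob_above(p_hat, m, c, mu, tau, grid_size=2000):
    """Approximate the posterior probability that p exceeds c, given p_hat."""
    total = 0.0
    above = 0.0
    for j in range(grid_size):
        p = (j + 0.5) / grid_size                       # midpoint of the j-th grid cell
        var = p * (1.0 - p) / m                         # first-stage variance at this p
        likelihood = math.exp(-0.5 * (p_hat - p) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        jacobian = 1.0 / (p * (1.0 - p))                # derivative of logit(p) with respect to p
        prior = (jacobian * math.exp(-0.5 * ((logit(p) - mu) / tau) ** 2)
                 / math.sqrt(2.0 * math.pi * tau ** 2))
        weight = likelihood * prior                     # unnormalized posterior density
        total += weight
        if p > c:
            above += weight
    return above / total


def theta_bayes(p_hats, ms, c, mu, tau):
    """Average of the per-school posterior probabilities of exceeding c."""
    probs = [posterior_prob_above(p_hat, m, c, mu, tau) for p_hat, m in zip(p_hats, ms)]
    return sum(probs) / len(probs)


# Hypothetical district with known hyperparameters.
mu, tau, c = logit(0.70), 1.0, 0.70
theta_b = theta_bayes([0.50, 0.60, 0.66, 0.72], [30, 30, 30, 30], c, mu, tau)
```

Unlike a pass or fail count, each school contributes a probability between 0 and 1, so a school whose observed proportion sits just below the target adds a fractional amount rather than a full unit.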

Research Design: A simulation procedure will be used to compare the performance of $\hat{\theta}_B$ and 
$\hat{\theta}_U$. The study will produce 500 replicates, which will be used to compute the probability $P^*$ in (5) as 
well as the expectation and variance under $P^*$ using Monte Carlo. The following factors will be 
varied in the study: number of students per school, number of schools per district, and AYP cut-off 
thresholds, $c$. Two different numbers of students in a school will be considered, the values of 
which will represent a small and a modest number of students. Two choices for the number of 
schools per district will also be considered. And finally, four different values of the true 
proportion of schools in the district meeting AYP will be considered. 

Findings / Results: A simulation procedure was carried out to demonstrate the improvement of 
$\hat{\theta}_B$ over $\hat{\theta}_U$ based on the hierarchical model in (2) and (3). The number of students in a school 
was varied to be $m = 30$ and $m = 200$. The number of schools in a district was varied to be $n = 10$ 
and $n = 15$. The AYP cut-off $c$ was set to 0.7 (70%), and the $\mu$ values corresponded to true 
proportions of schools in a district meeting AYP of $\theta = \{0.5, 0.599, 0.705, 0.813\}$. The prior 
variance was taken to be $\tau^2 = 1$. The true and observed proportions, $p_i$ and $\hat{p}_i$, were generated 
based on the above parameter specifications. 
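
Under the second stage of the model, the probability that a school's true proportion exceeds the cut-off follows directly from the normal distribution on the logit scale, so the true district-level proportion implied by a given prior mean can be computed in closed form. The sketch below does this. The four prior centres used in the loop (0.70, 0.75, 0.80, 0.85 on the proportion scale) are an assumption of this sketch, since the text reports only the resulting proportions; under that assumption the computed values agree with the reported ones to about three decimals.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def true_theta(mu, tau, c):
    """Probability that p exceeds c when logit(p) is normal with mean mu and sd tau."""
    return 1.0 - NormalDist(mu, tau).cdf(logit(c))


# Assumed prior centres on the proportion scale; only the cut-off 0.7 and
# the prior variance of 1 come from the text.
centres = (0.70, 0.75, 0.80, 0.85)
thetas = [true_theta(logit(centre), tau=1.0, c=0.7) for centre in centres]
```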

(Please insert Table 2 here) 

Bias, variance, and mean squared error (MSE) were used as indices of performance for 
each of the estimators, and the numerical results are presented in Table 2. The results indicate the 
superiority of the Bayes estimator $\hat{\theta}_B$ compared to the direct (non-Bayes) estimator $\hat{\theta}_U$. The 
inclusion of the confidence interval width led to the increased bias of $\hat{\theta}_U$, contributing to a higher 
MSE. The $i$-th term of the estimate $\hat{\theta}_U$ is based on the maximum likelihood estimator $\hat{p}_i$ which 
uses the information only from the $i$-th school. Thus, $\hat{p}_i$ has high variability, particularly if $m_i$ is 
small. On the other hand, the $i$-th term in the Bayes estimator $\hat{\theta}_B$ is derived from a combination 
of $\hat{p}_i$ and the overall mean $\mu$. The overall mean contribution has the effect of reducing the 
variance of $\hat{\theta}_B$ and thereby increasing its stability. 
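
The three performance indices can be computed from a set of replicate estimates as in the sketch below. It assumes the variance uses the number of replicates as its divisor, so that the MSE equals the squared bias plus the variance exactly; the function name and the three example estimates are hypothetical and are not results from the study.

```python
def performance(estimates, truth):
    """Bias, variance, and MSE of replicate estimates of a known target value."""
    r = len(estimates)
    mean_est = sum(estimates) / r
    bias = mean_est - truth
    # Divisor r (not r - 1), so that mse == bias**2 + variance up to rounding.
    variance = sum((e - mean_est) ** 2 for e in estimates) / r
    mse = sum((e - truth) ** 2 for e in estimates) / r
    return bias, variance, mse


# Hypothetical replicate estimates of a true proportion of 0.5.
bias, variance, mse = performance([0.6, 0.7, 0.8], truth=0.5)
```

The decomposition shows why an estimator with a large upward bias can have a high MSE even when its variance is moderate.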

Conclusions: The proposed Bayes estimator for AYP classification offers an attractive 
alternative to current methods, as measured by performance in terms of bias, MSE, and variance 
reduction. 
Appendix B. Tables and Figures 

Table 1: Number of public school 4th graders in school districts in the State of Michigan 

Range of Numbers of Students    Number of School Districts 
1 - 100                         210 
100 - 200                       124 
200 - 300                       62 
300 - 500                       39 
500 - 1000                      30 
1000 -                          11 



Table 2: Simulation Results 

Figure 1: Standard deviation of public school students' assessment scores in available 
school districts in the State of Michigan. 